Distributed Multinomial Regression
نویسنده
چکیده
This article introduces a model-based approach to distributed computing for multinomial logistic (softmax) regression. We treat counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories. The work is driven by the high-dimensional-response multinomial models that are used in analysis of a large number of random counts. Our motivating applications are in text analysis, where documents are tokenized and the token counts are modeled as arising from a multinomial dependent upon document attributes. We estimate such models for a publicly available data set of reviews from Yelp, with text regressed onto a large set of explanatory variables (user, business, and rating information). The fitted models serve as a basis for exploring the connection between words and variables of interest, for reducing dimension into supervised factor scores, and for prediction. We argue that the approach herein provides an attractive option for social scientists and other text analysts who wish to bring familiar regression tools to bear on text data.
منابع مشابه
Multinomial logit - Wikipedia, the free encyclopedia
In statistics, a multinomial logit (MNL) model, also known as multinomial logistic regression, is a regression model which generalizes logistic regression by allowing more than two discrete outcomes.[1] That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which...
متن کاملDS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression
Multinomial logistic regression is a popular tool in the arsenal of machine learning algorithms, yet scaling it to datasets with very large number of data points and classes has not been trivial. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient ...
متن کاملDistributed training of Large-scale Logistic models
Regularized Multinomial Logistic regression has emerged as one of the most common methods for performing data classification and analysis. With the advent of large-scale data it is common to find scenarios where the number of possible multinomial outcomes is large (in the order of thousands to tens of thousands) and the dimensionality is high. In such cases, the computational cost of training l...
متن کاملMultitarget Tracking of Distributed Targets Using Histogram-PMHT
It has been shown previously that the Histogram Probabilistic Multi-Hypothesis Tracking (HPMHT) algorithm enables tracking of targets distributed across several sensor cells. Two related aspects of the H-PMHT algorithm are discussed in this paper, namely the estimation of target spread, and the use of the negative multinomial distribution to compensate for unobserved sensor cells. The examples ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015